The phase transition in random catalytic sets 
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The notion of (auto) catalytic networks has become a cornerstone in understanding the possibility 
of a sudden dramatic increase of diversity in biological evolution as well as in the evolution of 
social and economical systems. Here we study catalytic random networks with respect to the final 
outcome diversity of products. We show that an analytical treatment of this longstanding problem 
is possible by mapping the problem onto a set of non-linear recurrence equations. The solution of 
these equations show a crucial dependence of the final number of products on the initial number 
of products and the density of catalytic production rules. For a fixed density of rules we can 
demonstrate the existence of a phase transition from a practically unpopulated regime to a fully 
populated and diverse one. The order parameter is the number of final products. We are able to 
further understand the origin of this phase transition as a crossover from one set of solutions from 
a quadratic equation to the other. 

PACS numbers: 87.10.+e, 89.75.-k, 02.10.Ox, 05.70.Ln, 89.75.Hc, 
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I. INTRODUCTION 

Chemicals act on chemicals to produce new chemicals, 
goods act on goods to produce new goods, ideas act on 
ideas to produce new ideas. The concept that elements 
of a set act on other elements of the same set to pro- 
duce new elements which then become part of this set 
is ubiquitous not only in nature but also in social sys- 
tems and processes. We might think of the development 
of modern chemistry, where the invention of a new com- 
pound leads to possibilities to use this compound (as a 
catalyst) to produce yet another compound. The same is 
true for economy, where one newly introduced good can 
be used as a tool to produce new goods and tools. Other 
famous example are recent models of evolution, maybe 
even our whole concept of history in general can be seen 
as a process of this type. 

Maybe the most fascinating question associated with 
these processes is under which conditions self-sustaining 
systems can emerge, i.e. that the newly produced chemi- 
cals, goods, or ideas find adequate other new or old chem- 
icals, goods, and ideas such that they can act on each 
other to produce yet new products, and so on. For any 
scientific approach to this subject it is clear that it is nec- 
essary to specify rules, which product can act on another 
product to produce a third one. For example there is 
a chemical rule that oxygen and hydrogen will produce 
water but there is no rule that gold and helium can form 
a compound. There is a rule that a hammer acting on a 
block of iron will produce sheet metal, but no rule that 
welding together two blocks of uranium 238 will lead to 
a big block of uranium. If one imagines that all existing 



'Electronic address: thurner@univie.ac .at 



and non-existing - but possible - products are listed in 
a high dimensional vector, than the rules how those ele- 
ments can act on each other can be thought of elements 
in an interaction matrix, where a zero entry means no 
interaction, and a non-zero element gives the interaction 
strength. 

In history there have been plenty of instances where 
a system of the above type underwent a transition from 
a state with very few products to a state of vast abun- 
dance of products. These transitions happen over rela- 
tively short timescales. An example from biology is the 
Cambrian explosion Q , where an unprecedented number 
of new taxa emerged within very short timescales. An 
economical example is the industrial revolution, where 
the number of industrial goods exploded to previously 
unimaginable numbers, a social example is the explosion 
of culture with the advent of the renaissance, or a more 
modern example the explosion of the number chemical 
compounds in the last century. 

All of these 'explosive' processes share the same struc- 
ture: There are possibly constructively interacting ele- 
ments, interaction is governed by a set of rules (natural 
laws, social consensus, religious restrictions), and a num- 
ber of initially existing products. What does trigger the 
explosive event, why is this event sometimes missing for 
very long timeperiods? The only parameters at hand are: 
the number of rules, the number of initially present ele- 
ments, and a possible structure in the interaction matrix. 
It is likely that the cultural explosion was driven by an 
increase of rules which was possible by driving back reli- 
gious restrictions. The explosion of chemical compounds 
was possible by discovering the rules of modern chem- 
istry. Let us mention here that for well studied systems 
the set of rules can in principle be known completely. 
However, for large systems, it might be wise as a first 
step to model them stochastically, i.e. let the interaction 
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matrix be a random matrix. In this case there will of 
course be no structure in the interaction matrix, and the 
only parameters will be rule density r and initial number 
ao of existing elements. 

Since quite some time there has been conjectured the 
existence of a phase transition in the above systems, e.g. 
in 0- By this we mean that in the r — oq plane there 
exist well defined regions, which are practically unpopu- 
lated, or almost fully populated, with a sharp transition 
between the regimes. This means that fixing the num- 
ber of initial elements there exists a critical density of 
rules r cr it . When the system is below r cr a the number of 
elements will remain relatively low compared to the to- 
tal number of possible products. Above r cr i t the system 
will become self-sustaining and drive towards a heavyly 
populated state. 

Even though this setting seems to be fundamental to 
a variety of disciplines, and its importance has long been 
noticed 0,0, the progress of a systematic scientific treat- 
ment of these problems is limited. Relevant contribu- 
tions to this field come from chemistry and biology. The 
adequate mathematical treatment of such processes are 
so-called catalytic equations, which are sets of coupled, 
quadratic, ordinary differential equations. Special cases 
of those equations are for example the class of Lotka- 
Volterra replicators, the hypercycle [5j, or more recent 
ideas like the Turing gas |(|. Replicator dynamics, which 
is linear, is obtained from catalytic dynamics by a proper 
scaling of time. It is needless to mention that the non- 
linear catalytic equations carry tremendous potential of 
complex dynamics, however, in earlier studies some ro- 
bustness in terms of fixed points seems to have been ob- 
served. For more details see Q. 

The aim of the present work is to prove the existence 
and study details of the nature of the above mentioned 
phase transition. This is possible by introducing some 
concepts of set theory, and - making use of the random 
structure of the production rules - by mapping the size of 
a consecutive series of sets into a set of update equations, 
which can be solved and analyzed. The analytic formu- 
las give insights into what is happening at the transition. 
A practical aspect of this present work is that we cover 
analytically the large system limit, which is beyond nu- 
merical reachability, and has sofar not been possible to 
study. 

The intuition for this work is based on a bit-string 
formulation of the above problem, which is a generaliza- 
tion of models recently termed random grammars Q . In 
the bit-string model there is a set of initial bit-strings 
and a set of strings which act on these strings by either 
combining strings or substituting sub-strings. These lat- 
ter strings can be seen as catalysts, their existence con- 
stitutes the presence of rules of what can be combined 
and/or substituted. The model which is presented below 
can be shown to be one to one compatible with the substi- 
tution and combination rules in the Kauffman bit-string 
model. 

The paper is structured as follows: Section 2, where 



the hard working authors state the formal problem and 
develop the necessary notation and concepts. Section 3, 
where the brave authors derive the set of update equa- 
tions. Section 4, where the struggling authors succeed 
in solving these equations reasonably well, and Section 
5, where the victorious authors present their results. In 
Section 6 we discuss the results. 



II. THE NETWORK-EQUATION AND SOME 
NOTATION 

Our basic concept is to view the abundances of all 
possible products as entries in a d-dimcnsional time- 
dependent vector x(t), i.e. Xi(t) is the quantity of prod- 
uct i present at time t. We drop the time notation in the 
following. The population dynamics of a system, where 
product i under the influence of product j produces prod- 
uct k is given by network-equations of the following type 

x k = {x,a k x) - x fe $ , $ = ^2(x,a k x) . (1) 

k 

with k £ A, and the component a k j > 0. A is the domain 
of nodes - or put in fancier terms - the index set of the 
dynamic process. (, ) is the inner product of statevector 
x with dimensionality d — |A|, the number of all possible 
products or nodes. The statevector is a vector of relative 
frequencies < Xk < 1, and J^ fc Xk = 1. Each a k is a lin- 
ear operator providing the quadratic forms (.,a .) and 
encodes the structural information about which binary 
combinations E A 2 , the substrates, can interact in 
order to form product k. Each entry in a k is a real num- 
ber, however, for the sake of simplicity we consider only 1 
for interaction or for no interaction. a k can be thought 
of as the set of rules how objects interact. In the following 
a k is sampled as a random matrix in the following way: 
We assume that for each product k G A there are pairs of 

substrate (L(k),M(k)) e A 2 such that L(k) {k}. In 
words, this arrow means: k is produced by substrate L(k) 
under the 'influence' of substrate M(k). Note, that L(k) 
and M{k) do not have to be unique, there can be more 
than one pair (L(k),M(k)) producing a specific product 
k. Lets call the number of pairs leading to product k, 
J^L,M(k)- We define the production rule density as the 
average number of pairs leading to one product 

r/2=(Af LtM (k))k ■ (2) 

Equations like Q are long known for their rather sur- 
prising robustness in terms of fixed points where the sys- 
tem converges to. This is not obvious and one would 
rather expect a situation more dominated by more com- 
plicated orbits and limit cycles. For a more detailed dis- 
cussion of this type of equations see Q. Fixed points 
will therefore provide much information about the effec- 
tive behavior of such systems, the fixed point equation 
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being 

x* k = ±;(x*,a k x*) . (3) 

We are dealing with a nonlinear process, so that the so- 
lutions of the network-equation will in general depend 
strongly on the initial conditions. Even assuming the 
process to be driven towards a stable fixed point does 
not imply the uniqueness of that fixed point. 

A. Notation 

We want to get some feeling what to expect when inves- 
tigating the population dynamics of randomly sampled 
fixed 0/1-networks. The dynamical properties of Eq. 
are linked to specific topologies. Knowing the topologic 
features will enable one to solve for the dynamics. Not 
knowing the interaction tensor a exactly, and only given 
its statistical properties, following the classical concept of 
statistical physics, one still can understand the expected 
dynamics of the system, and in particular its expected 
final outcome. We hence study the system in a proba- 
bilistic fashion, i.e. averaging over all possible topologies. 
The choices for (L,M)(k) are equally distributed and L 
and M can be seen as independent random variables 

The question we try to answer is: with how many prod- 
ucts do we expect to end up with when starting from 
a given number of randomly choosen initial substrate 
species? Before we proceed we need a number of defi- 
nitions. We denote the number of elements contained in 
some set A by writing |j4|, and define the support of a 
process at a given time 

S(x) := {k\x k > 0} . (4) 

Suppose that the process is driven to a (stable) fixed 
point such that the final support is 

S*(x) := S( lim x(t)) . (5) 

t — »oo 

Inversely, we can ask which initial conditions end up in 
the same fixed point ^(j an d define the body of this fixed 
point to be the set 

B(x*) := jzl^Zfc = 1 Afim x(t) =a;*| . (6) 

We further define the following operations on product- 
sets: the foreward difference of set A, 

d+A:={keA\{L(k),M(k)}cA}\A (7) 

and the backward difference of A, 

d-A:=U keA {L(k),M(k)}\A . (8) 

With these operations we can define the foreward closure 
of set A 

A+ := C\{B\A C B Ad+B C B} , (9) 



and the backward closure of a set, 

A~ := D{B\A C B Ad-B C B} . (10) 

These definitions can best be understood by viewing our 
dynamical system Eq. as a directed graph whose 
nodes are connected by arrows. A node looking at its 
edges can therefore distinguish between feather ends or 
arrow heads. A node holding a feather end of an arrow 
is a substrate, a node holding an arrow head is a product 
with respect to this edge. Given some set of nodes we 
can identify for each node k all the nodes which will be 
formed due to the influence of k in the next timestep. 
In this sense we say the nodes look 'foreward'. The fore- 
ward difference of some set is therefore the set of all prod- 
ucts the nodes of the set look foreward to, excluded the 
products that are already present as nodes of the set we 
started out with. The backward difference follows the 
same idea only looking at the arrows from arrowhead to- 
wards the feathers. The corresponding closures are then 
simply obtained by adding the set-differences iteratively 
to the initial set, i.e. we add the difference to the ini- 
tial set and form a new difference on the first extension 
of the set, then add this difference to obtain the second 
extension of the initial set and so on, until the difference 
(forward or backward) is empty and the iterative process 
comes to a halt. That this eventually may happen can 
be understood by looking at chains of arrows as an ex- 
ample. Take a node and add an arrow pointing at some 
other randomly chosen node in the set. In the beginning 
the chance to select a virginal node is high and the chain 
will grow but as the chain grows the probability of sam- 
pling a member already in the chain increases, and even 
though the single sample probability may still be small, 
the chance to sample into the chain eventually is not. 

For clarity let us summarize our philosophy: All ob- 
jects which can possibly act on each other are repre- 
sented by the index set A of the processing system Eq. 
||TJ , which contains all the 'names' of the considered and 
'thinkable' objects. The index set provides the domain 
for all dynamical considerations. Properties of the system 
are implemented via the map x : A — > R + , which are the 
relative frequencies of the indexed species in the domain. 
We are not interested in particular weights but only in the 
directed network topology coded by the matrices a k on 
the domain A. Not only does this topological approach 
provide us with the means to talk about randomly dis- 
tributed productive substrate pairs, but also about their 
density of occurrence r. Even more important, if we are 
not interested in details of the dynamics but decide to fo- 
cus only on how large the expected final set will be (given 
some initial substrate set and density) we can drop dy- 
namical considerations and pass to topological ones. We 
are not interested in how much of each object species we 
will effectively end up with, just if it got produced or not. 
The subset of the domain that is effectively populated is 
called the support S C A. To investigate the flow of 
the support under the network equation we may utilize 
set-operations compatible with the topological structure 
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of directed reaction graphs leading to the definition of 
the forward and the backward difference and their re- 
spective closures. It is intuitively clear, that the forward 
closure of some set is the subset of the domain that is 
flooded by the initial set during the production process. 
It necessarily forms an upper limit for the size of possi- 
ble self-sustainable subsets of the domain reachable from 
the initial condition. In a simplification considering an 
equally distributed random interaction topology we gain 
a notion of expected growth rates for the set-differences 
based on expected sampling rates. This leads to equa- 
tions of expected growth as demonstrated next. 



III. THE GROWTH LAWS FOR CATALYTIC 
SETS 

We now develop a method to compute the size of the 
foreward closure — \A + \ as a function of the produc- 
tion rule density and the size of the initial set A. 
is the final number of products once the system has con- 
verged. The probability of some fixed k being the product 
of some fixed pair of substrates, I and m is obviously 

Imagine an initial random set Aq of products with a 
number of ao — \Ao\ elements. Given the dimension- 
ality d of the system and a production rule density r, 
the expected number of produced elements in the next 
timestep is the number of possible pairs in Aq times the 
probability to find a productive pair, i.e. 

rd ( a \ ra(a - 1) ra 2 

T (, 2 J * = "2(d = l)- ~ M ■ (12) 

Several of these newly created elements will already be € 
Aq. The probability that one of these produced elements 
is not yet G A a is 1 — a a /d, leading to the actual size of 
the catalytic set at time 1, Ax 

ai = a Q + Aa Q with Aa = ^ (l - a 2 , (13) 

with Aao = |<9+Ao| being the increment of elements. 
What will happen in the next timestep? We now have 
Ai = A U d + A n . The increment for the next timestep 
will be made up of all the new products which are possi- 
ble (and not already in A\) by combining two elements 
from 8+Aq, or by combining one element from d+Ao with 
one from Aq. The number of possible pairs for those 
combinations are Aao(Aao — l)/2 ~ Aa^/2 and do Aao, 
respectively. Note, that the third possibility combining 2 
elements from the set Aq will always lead to 8+Aq which 
is already € A\, and no new products can emerge from 
this. These possibilities multiplied by p, rd/2 and prob- 
ability that the new product lies in A\ already, lead to 
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FIG. 1: Solution to the update equations Eq. I15H (a) and 
its analytical approximation Eq. i1'2'2fi (b). There is clearly 
a critical line in the r — ao plane where a transition from an 
unpopulated to an almost fully populated regime occurs. The 
spikes in (b) are due to numerical problems in the plot with 
poles in the solution a^. 

the increment for the second timestep 

Aai = ^ (l - ^) (Aa 2 + 2a Aa ) 

-a^M-® ' <14) 

Now continue the iterative sceme A t +i = A t Ud+A t . This 
means that the set at time t + 1 is the old set plus the 
products newly generated in the timespan [t,t + 1]. We 
can finally write down a growth equation for catalytic 
sets 

at+i = a t + Aa t , Aa t+ i = — [I — j (a t+1 - a t ) , 

"" (15) 
with initial conditions a = |A | and a_i — 0. To sum 

up, we have to take care of all possible new pairs by 

looking at all new elements added in an iteration step 
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and the new pairs they can build with themselves. We 
also have to take into account all the pairs they can build 
with elements produced earlier. We have to exclude all 
the pairs that allready have been considered. This is 
all captured by Eq. (|15fl . which should be noted to be 
scale invariant with respect to dimension d. To see this 
just scale a — > a/d, and the dimension drops out of the 
equation. It is therefore fully justified to drop d from Eq. 
I|15[l. if wanted. 



IV. SOLUTION OF THE UPDATE EQUATION - 
RESULT 

The solution of the growth equation Eq. H15|l with 
respect to the productive pair density r and the initial 
set-size ao is given in Fig. H(a). The immediate message 
is that there is a critical density r cr i t above which a con- 
tinuous increase of the initial set-size ao does not corre- 
spond with a continuous increase of its foreward closure 
size (Zoo, but displays a phase transition, a jump from 
small to very large foreward closures at some a,Q rlt (r). 
ag"* vanishes for r < r cr i t . This demonstrates unam- 
biguously the existence of a phase transition in catalytic 
random systems. 



A. Analytical approximation of the foreward 
closure size 
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(b) 



Let us define c t = Aa t +i/ Aa t . As an approximation 
imagine for a moment that c t is a constant in time c. 
Then for the forward closure 
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«f) 



1 -c 



(16) 



A choice c which produces the right asymptotical results 
is 



c = r ( 1 — 



(17) 



which is seen by first dividing the right part of Eq. I|15|) 
by Aat and then taking the t — ► oo limit. Now use the 
Ansatz 



1 - c 



c = r ( 1 



(18) 



which leads to a polynomial of third degree aj^ — aj^d- 



Xa c - 



0, or in terms of c 



2c z + c(l + r-) + r- --1 =0 
d d \d 



(19) 



Substituting c = £+2/3 reduces to equation x 3 +px+q 
with 



If a 

p = -sl 1 - 3r d 



<l 



27 3 d 



1 a / a \ 



(20) 




initial support a^ 







pair density r 



FIG. 2: Plot of the solutions cioo- (a), aoo+ (b) and both of 
the above in one plot (c). It is clear that the pase transition 
happens as the solutions switch from <2oo+ to aoo-. 



so that Cardano's method can be used with x = y + z to 
get 



3y 



, (21) 



providing us with three solutions y + z, p 2 y + pz and 
py + p 2 z with p = exp(27rz/3). We are only interested in 
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the real solution y + z which yields the final result 

a oc = —^— with c = 2/3 + y + z > (22) 
1 — c 

which is a good approximation of the true solution of 
the forward equation Eq. (|15fl and is plotted in Fig. 
(b) in the ag-r plane. The comparison with Fig. 1 (a) 
demonstrates the quality of the analytical solution. 

The analytical form of the solution allows us to un- 
derstand the origin of the phase transition. For this 
reason we observe that we can compute by solving 
c = r(l — a 00 )a 0Cl providing us with two solutions 

aoo± ^( 1± \A^) • (23) 

These solutions are shown in Figs. 121(a) and (b). It be- 
comes obvious that the phase transition actually always 
takes place by the system switching from the one solution 
to the other at Oq"* (r) . Below a critical value of r cr u the 
one solution smoothely passes into the other. We observe 
a crossover of the probable and the improbable solution. 
Most likely this is due to a convexity argument, implying 
the monotonicity of the size of the forward closure with 
respect to the size of the initial condition. The size of 
the forward closure will not shrink with increasing the 
initial set size and therefore has to jump to the alterna- 
tive solution at some critical line. As long as the third 
order polynomial is strictly monotonic we have a unique 
real solution for the zero of the polynomial. At r cr , t the 
polynomial starts to have a local minimum and maximum 
and we have in fact two relevant real zeros (out of three), 
the large and the small solution. r cr n is determined by 
the tripple zero of the polynomial. Note a famous anal- 
ogy here: Just as in the Vanderwaal's gas we can draw 
Maxwell lines. At the phase-transition, the small solu- 
tion becomes instable, the large solution becomes sta- 
ble. Here the productive pair density r takes the role 
of temperature in the Vanderwaal's gas. Below the crit- 
ical value a system freezes into a small set of durable 
species, while above the critical production law density, 
supercritical but yet small sets of individuals evaporate 
into their foreward closure. The difference we have here 
compared to Vanderwaal's gas is that the liquid phase 
is concentrated on the high edge of the phase transition, 
while the phase-plane itself is a solid-liquid mixture. 

V. DISCUSSION AND OUTLOOK 

We have developed a way to map a conceptual system 
of autocatalytic agents into a quantitative framework. 
With this it is possible to show that the combination of 
the initial number of products ao and the density of mu- 
tual production rules r is crucially influencing the mode 



of growth of sets of products. Our main result is that 
we were able to map the class of random catalytic net- 
works into a set of growth equations, which allows us to 
study the expectation value of the final number of prod- 
ucts. The resulting equation can be solved and shows a 
clear phase transition from practically unpopulated zones 
towords almost fully populated ones, in 'rule density- 
initial number of products' space. The transition is a 
crossover between two sets of solutions to a quadratic 
equation. 

We believe that this nicely resembles numerical- 
experimental findings of 7], where the authors find a 
decrease of species with an increase of p e i in their Fig. 
1 (d). The fact that the transition from small to full 
forward closure sizes is gradual, is either due to the lim- 
ited system size (10 species), or that the initial support 
was high, i.e. above a,Q r%t (r), so that no sharp increase 
could be found. In fact, systems of the type of Eq. Q 
are straight forwardly solvable numerically up to system 
sizes of about d = 100 within reasonable computing time. 
We note that the practical value of this present work is 
that it captures the large system size limit, which is out 
of numerical reach and was not tractable analytically be- 
fore. 

The present approach does not try to explain the 
detailed role of auto catalytic cycles nor of keynode 
species in the context of understanding the beginning 
of an explosion of species numbers as was very nicely 
done for example in [3, 0J • The periods of fast extinction 
reported there are not incorporated in our consider- 
ation yet, since we have not taken any evolutionatry 
hypotheses into account so far, nor did we incorporate 
evolutionary concepts in the sense that products are 
associated characteristics (some sort of fitness, e.g. 
the weight Xk) upon which they can get selected by 
some method. Taking these arguments into account 
seems a natural starting point for future research. Wc 
believe that it should be possible that a combination of 
backward and forward closure arguments can be used 
to estimate a critical density of auta catalytic cycles 
necessary for a system to become critical as studied 
here. We have not so-far considered negative entries in 
the interaction matrices, which should also be present 
in natural or social systems. Finally, we mention that 
our arguments given here should - due to the absence of 
a characteristic scale in the update equations - not be 
limited to finite sets. 
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